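Cloud datacenters are growing exponentially in both number and size. This growth causes a surge in network activity that calls for better congestion avoidance. The resulting challenge is two-fold: (i) designing algorithms that can be custom-tuned to the complex traffic patterns of a given datacenter, while at the same time (ii) running on low-level hardware with the low latency required for effective congestion control (CC). In this work, we present a reinforcement learning (RL) based CC solution that learns from certain traffic scenarios and successfully generalizes to others. We then distill the RL neural network policy into binary decision trees to achieve the $\mu$sec decision latency required for real-time inference with RDMA. We deploy the distilled policy on NVIDIA NICs in a real network and demonstrate state-of-the-art performance, balancing all tested metrics simultaneously: bandwidth, latency, fairness, and packet drops.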
translated by 谷歌翻译
Vision transformers (ViTs) are quickly becoming the de-facto architecture for computer vision, yet we understand very little about why they work and what they learn. While existing studies visually analyze the mechanisms of convolutional neural networks, an analogous exploration of ViTs remains challenging. In this paper, we first address the obstacles to performing visualizations on ViTs. Assisted by these solutions, we observe that neurons in ViTs trained with language model supervision (e.g., CLIP) are activated by semantic concepts rather than visual features. We also explore the underlying differences between ViTs and CNNs, and we find that transformers detect image background features, just like their convolutional counterparts, but their predictions depend far less on high-frequency information. On the other hand, both architecture types behave similarly in the way features progress from abstract patterns in early layers to concrete objects in late layers. In addition, we show that ViTs maintain spatial information in all layers except the final layer. In contrast to previous works, we show that the last layer most likely discards the spatial information and behaves as a learned global pooling operation. Finally, we conduct large-scale visualizations on a wide range of ViT variants, including DeiT, CoaT, ConViT, PiT, Swin, and Twin, to validate the effectiveness of our method.
Purpose: Trans-oral robotic surgery (TORS) using the da Vinci surgical robot is a new minimally-invasive surgery method to treat oropharyngeal tumors, but it is a challenging operation. Augmented reality (AR) based on intra-operative ultrasound (US) has the potential to enhance the visualization of the anatomy and cancerous tumors to provide additional tools for decision-making in surgery. Methods: We propose and carry out preliminary evaluations of a US-guided AR system for TORS, with the transducer placed on the neck for a transcervical view. Firstly, we perform a novel MRI-transcervical 3D US registration study. Secondly, we develop a US-robot calibration method with an optical tracker and an AR system to display the anatomy mesh model in the real-time endoscope images inside the surgeon console. Results: Our AR system reaches a mean projection error of 26.81 and 27.85 pixels for the projection from the US to stereo cameras in a water bath experiment. The average target registration error for MRI to 3D US is 8.90 mm for the 3D US transducer and 5.85 mm for freehand 3D US, and the average distance between the vessel centerlines is 2.32 mm. Conclusion: We demonstrate the first proof-of-concept transcervical US-guided AR system for TORS and the feasibility of trans-cervical 3D US-MRI registration. Our results show that trans-cervical 3D US is a promising technique for TORS image guidance.
The ability to compare the semantic similarity between text corpora is important in a variety of natural language processing applications. However, standard methods for evaluating these metrics have yet to be established. We propose a set of automatic and interpretable measures for assessing the characteristics of corpus-level semantic similarity metrics, allowing sensible comparison of their behavior. We demonstrate the effectiveness of our evaluation measures in capturing fundamental characteristics by evaluating them on a collection of classical and state-of-the-art metrics. Our measures reveal that recently developed metrics are becoming better at identifying semantic distributional mismatch, while classical metrics are more sensitive to perturbations at the surface text level.
The task of topical segmentation is well studied, but previous work has mostly addressed it in the context of structured, well-defined segments, such as segmentation into paragraphs, chapters, or segmenting text that originated from multiple sources. We tackle the task of segmenting running (spoken) narratives, which poses hitherto unaddressed challenges. As a test case, we address Holocaust survivor testimonies, given in English. Other than the importance of studying these testimonies for Holocaust research, we argue that they provide an interesting test case for topical segmentation, due to their unstructured surface level, relative abundance (tens of thousands of such testimonies were collected), and the relatively confined domain that they cover. We hypothesize that boundary points between segments correspond to low mutual information between the sentences preceding and following the boundary. Based on this hypothesis, we explore a range of algorithmic approaches to the task, building on previous work on segmentation that uses generative Bayesian modeling and state-of-the-art neural machinery. Compared to manually annotated references, we find that the developed approaches show considerable improvements over previous work.
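The boundary hypothesis above can be illustrated with a minimal sketch. It does not estimate mutual information and is not the method of the paper: as a loudly labeled stand-in, it scores each gap between sentences by the lexical (Jaccard) overlap of the words on either side and places boundaries at the lowest-scoring gaps. The function names `gap_scores` and `pick_boundaries`, the `window` parameter, and the toy sentences are all hypothetical.

```python
from typing import List


def gap_scores(sentences: List[str], window: int = 2) -> List[float]:
    """Score every gap between consecutive sentences.

    Assumption: Jaccard word overlap between the `window` sentences before
    and after a gap is used as a crude proxy for mutual information.
    scores[i] belongs to the gap just before sentence i + 1.
    """
    tokens = [set(s.lower().split()) for s in sentences]
    scores = []
    for gap in range(1, len(tokens)):
        left = set().union(*tokens[max(0, gap - window):gap])
        right = set().union(*tokens[gap:gap + window])
        union = left | right
        scores.append(len(left & right) / len(union) if union else 0.0)
    return scores


def pick_boundaries(sentences: List[str], n_boundaries: int, window: int = 2) -> List[int]:
    """Return the indices of the sentences that start a new segment."""
    scores = gap_scores(sentences, window)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(i + 1 for i in ranked[:n_boundaries])


# Toy narrative: three sentences about one topic, then three about another.
story = [
    "the bakery sold fresh bread every morning",
    "fresh bread from the bakery smelled wonderful",
    "customers queued at the bakery for bread",
    "later our train crossed a long bridge",
    "a second train waited beyond that bridge",
    "both train crews inspected that bridge",
]
boundaries = pick_boundaries(story, n_boundaries=1)
```

In this toy input the two topics share no words, so the gap before the fourth sentence is the only one with zero overlap. A real system would replace the overlap score with a model-based estimate of the dependence between the text on either side of the gap.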
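Standard diffusion models involve an image transform (adding Gaussian noise) and an image restoration operator that inverts this degradation. We observe that the generative behavior of diffusion models does not depend strongly on the choice of image degradation; in fact, an entire family of generative models can be constructed by varying this choice. Even with completely deterministic degradations (e.g., blur, masking, and more), the training and test-time update rules that underlie diffusion models can be easily generalized to create generative models. The success of these fully deterministic models calls into question the community's understanding of diffusion models, which relies on noise in either gradient Langevin dynamics or variational inference, and paves the way for generalized diffusion models that invert arbitrary processes. Our code is available at https://github.com/arpitbansal297/cold-diffusion-models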
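To make the idea of a test-time update rule for a deterministic degradation concrete, here is a minimal sketch on a 1-D signal. Everything in it is an assumption made for illustration and is not taken from the abstract: the degradation `degrade` (blending every sample toward the signal mean), the update x_{s-1} = x_s - D(x0_hat, s) + D(x0_hat, s-1), and the hand-written `biased_restore`, which stands in for what would in practice be a trained restoration network.

```python
from typing import Callable, List

Signal = List[float]


def degrade(x0: Signal, t: int, steps: int) -> Signal:
    """Deterministic, noise-free degradation (illustrative choice).

    Blends each sample toward the signal mean: t = 0 returns x0 unchanged,
    t = steps returns a constant signal.
    """
    mean = sum(x0) / len(x0)
    alpha = t / steps
    return [(1 - alpha) * v + alpha * mean for v in x0]


def sample(x_t: Signal, restore: Callable[[Signal, int], Signal], steps: int) -> Signal:
    """Walk back from the fully degraded signal, one severity level at a time."""
    x = list(x_t)
    for s in range(steps, 0, -1):
        x0_hat = restore(x, s)                  # current estimate of the clean signal
        d_s = degrade(x0_hat, s, steps)         # estimate re-degraded to level s
        d_prev = degrade(x0_hat, s - 1, steps)  # estimate re-degraded to level s - 1
        x = [xi - a + b for xi, a, b in zip(x, d_s, d_prev)]
    return x


clean = [0.0, 1.0, 4.0, 2.0]
STEPS = 8


def biased_restore(x: Signal, s: int) -> Signal:
    """Hypothetical stand-in for a learned restoration operator.

    It returns the clean signal plus a constant error of 0.5, so the
    restoration is deliberately imperfect.
    """
    return [v + 0.5 for v in clean]


start = degrade(clean, STEPS, STEPS)           # constant signal at the mean
recovered = sample(start, biased_restore, STEPS)
```

For this particular degradation the constant restoration error cancels in the difference between the two re-degraded estimates, so `recovered` matches `clean` up to floating-point error. That cancellation is a property of this toy setup and should not be read as a general guarantee.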
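We propose a concise video representation that encodes perceptually meaningful features into graphs. With this representation, we aim to exploit the large amount of redundancy in videos and save computation. First, we construct superpixel-based graph representations of videos by treating superpixels as graph nodes and creating spatial and temporal connections between adjacent superpixels. Then, we use graph convolutional networks to process this representation and predict the desired output. As a result, we are able to train models with far fewer parameters, which translates into short training periods and reduced computational resource requirements. A comprehensive experimental study on the publicly available datasets Kinetics-400 and Charades shows that the proposed method is highly cost-effective and uses only limited commodity hardware during training and inference. It reduces the computational requirements 10-fold while achieving results comparable to state-of-the-art methods. We believe that the proposed approach is a promising direction that could open the door to solving video understanding more efficiently and enable more resource-limited users to thrive in this research field.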
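The graph construction step can be sketched as follows, under stated assumptions. The sketch takes per-frame superpixel label grids as given (how they are computed is out of scope here) and uses two illustrative rules that the abstract does not specify: a spatial edge joins two different superpixels that touch horizontally or vertically within a frame, and a temporal edge joins the superpixels that cover the same pixel location in consecutive frames. The name `build_superpixel_graph` is hypothetical.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]   # (frame index, superpixel label)
Edge = Tuple[Node, Node]


def build_superpixel_graph(frames: List[List[List[int]]]) -> Tuple[Set[Edge], Set[Edge]]:
    """Build spatial and temporal edge sets from per-frame label grids."""
    spatial: Set[Edge] = set()
    temporal: Set[Edge] = set()
    for t, grid in enumerate(frames):
        rows, cols = len(grid), len(grid[0])
        for r in range(rows):
            for c in range(cols):
                here = (t, grid[r][c])
                # Spatial edges: look at the right and bottom neighbours only,
                # which visits every adjacent pixel pair exactly once.
                for dr, dc in ((0, 1), (1, 0)):
                    rr, cc = r + dr, c + dc
                    if rr < rows and cc < cols and grid[rr][cc] != grid[r][c]:
                        there = (t, grid[rr][cc])
                        spatial.add((min(here, there), max(here, there)))
                # Temporal edges: same pixel location in the next frame.
                if t + 1 < len(frames):
                    temporal.add((here, (t + 1, frames[t + 1][r][c])))
    return spatial, temporal


# Two tiny 2x3 frames, each with three superpixels labelled 0, 1, 2.
video = [
    [[0, 0, 1],
     [0, 2, 1]],
    [[0, 1, 1],
     [2, 2, 1]],
]
spatial_edges, temporal_edges = build_superpixel_graph(video)
```

The resulting node and edge sets are what a graph convolutional network would consume. Per-node features (for example, mean colour of each superpixel) are omitted to keep the sketch short.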
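Crafting machine learning (ML) based systems requires statistical control throughout their life cycle. Careful quantification of business requirements, and identification of the key factors that affect them, reduces the risk of project failure. Quantifying the business requirements leads to the definition of random variables that represent the system's key performance indicators, which need to be analyzed through statistical experiments. In addition, the data available for training and the results of experiments affect the design of the system. Once the system is developed, it is tested and continually monitored to ensure that it meets its business requirements. This is done through the continued application of statistical experiments to analyze and control the key performance indicators. This book teaches the art of crafting and developing ML-based systems. It advocates an "experiment first" approach, stressing the need to define statistical experiments from the beginning of the project life cycle. It also discusses in detail how to apply statistical control to an ML-based system throughout its life cycle.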
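Graph neural networks (GNNs) have emerged as highly successful tools for graph-related tasks. However, real-world problems involve very large graphs, and the compute resources needed to fit GNNs to those problems grow rapidly. Moreover, the noisy nature and size of real-world graphs cause GNNs to overfit if they are not properly regularized. Surprisingly, recent works show that large graphs often contain many redundant components that can be removed without compromising performance too much. This includes removing nodes or edges during inference through the GNN layers, or as a pre-processing step that sparsifies the input graph. This intriguing phenomenon enables the development of state-of-the-art GNNs that are both efficient and accurate. In this paper, we take a further step towards demystifying this phenomenon and propose a systematic method called Locality-Sensitive Pruning (LSP) for graph pruning based on locality-sensitive hashing. We aim to sparsify a graph so that similar local environments in the original graph result in similar environments in the resulting sparsified graph, which is an essential feature for graph-related tasks. To justify pruning based on local graph properties, we exemplify the advantage of pruning based on locality properties over other pruning strategies in various scenarios. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of LSP, which removes a significant number of edges from large graphs without compromising performance, accompanied by a considerable acceleration.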
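As a rough illustration of pruning with locality-sensitive hashing, the sketch below hashes each neighbour's feature vector with random-hyperplane signatures and keeps one neighbour per hash bucket in every node's neighbourhood. This is one possible reading chosen for brevity, not the LSP algorithm of the paper: the names `simhash` and `lsh_prune`, the choice of hash family, and the keep-first-per-bucket rule are all assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def simhash(vec: Sequence[float], planes: List[List[float]]) -> Tuple[int, ...]:
    """Random-hyperplane signature: one sign bit per hyperplane."""
    return tuple(
        1 if sum(v * p for v, p in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


def lsh_prune(
    adj: Dict[int, List[int]],
    feats: Dict[int, List[float]],
    n_bits: int = 4,
    seed: int = 0,
) -> Dict[int, List[int]]:
    """Keep the first neighbour from each hash bucket in every neighbourhood.

    Assumption: neighbours whose features hash to the same signature are
    treated as redundant within that neighbourhood.
    """
    rng = random.Random(seed)
    dim = len(next(iter(feats.values())))
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    pruned: Dict[int, List[int]] = {}
    for node, neighbours in adj.items():
        seen = set()
        kept = []
        for nb in neighbours:
            signature = simhash(feats[nb], planes)
            if signature not in seen:
                seen.add(signature)
                kept.append(nb)
        pruned[node] = kept
    return pruned


# Nodes 1 and 2 have identical features; node 3 points the opposite way.
features = {0: [1.0, 0.0], 1: [0.5, 2.0], 2: [0.5, 2.0], 3: [-0.5, -2.0]}
adjacency = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
sparse = lsh_prune(adjacency, features)
```

Identical feature vectors always share a signature, and exactly opposite vectors differ in every bit, so node 0 keeps neighbours 1 and 3 and drops 2. The adjacency here is directed (each node prunes its own neighbour list); an undirected variant would need an extra rule for reconciling the two endpoints of an edge.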
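The API economy refers to the widespread integration of API (application programming interface) microservices, through which software applications can communicate with each other, as a crucial element of business models and functions. The number of possible ways in which such a system can be used is huge. It is therefore desirable to monitor usage patterns and to identify when the system is being used in a way it has never been used before. This gives system analysts a warning so that they can ensure uninterrupted operation of the system. In this work, we analyze both histograms and call graphs of API usage to determine whether the usage patterns of the system have shifted. We compare the application of nonparametric statistics and Bayesian sequential analysis to the problem. This is done in a way that overcomes the issue of repeated statistical tests and ensures the statistical significance of the alerts. The technique was simulated and tested, and it proved effective in detecting drift in various scenarios. We also mention modifications to the technique that reduce its memory, so that it can respond more quickly when the distribution drift occurs some time after monitoring begins.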
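A minimal sketch of nonparametric drift detection on API usage histograms is given below. It is not the procedure of the paper: it swaps in a permutation test on the total variation distance between a baseline window and a recent window of API call names, and it handles repeated testing with a plain Bonferroni split of the significance level across a planned number of checks. The function names, the `n_checks` parameter, and the toy call logs are hypothetical.

```python
import random
from collections import Counter
from typing import List


def tv_distance(a: List[str], b: List[str]) -> float:
    """Total variation distance between two empirical usage histograms."""
    ca, cb = Counter(a), Counter(b)
    keys = set(ca) | set(cb)
    return 0.5 * sum(abs(ca[k] / len(a) - cb[k] / len(b)) for k in keys)


def permutation_p_value(
    baseline: List[str], recent: List[str], n_perm: int = 1000, seed: int = 0
) -> float:
    """Fraction of random re-splits whose distance is at least the observed one."""
    rng = random.Random(seed)
    observed = tv_distance(baseline, recent)
    pooled = list(baseline) + list(recent)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        left, right = pooled[:len(baseline)], pooled[len(baseline):]
        if tv_distance(left, right) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def drift_alert(
    baseline: List[str], recent: List[str], alpha: float = 0.05, n_checks: int = 10
) -> bool:
    """Raise an alert only if the p-value clears a Bonferroni-corrected level.

    Assumption: the monitor plans to run `n_checks` tests in total.
    """
    return permutation_p_value(baseline, recent) < alpha / n_checks


baseline_calls = ["login"] * 80 + ["search"] * 20
same_calls = ["login"] * 80 + ["search"] * 20
shifted_calls = ["login"] * 20 + ["search"] * 80

no_drift = drift_alert(baseline_calls, same_calls)
drift = drift_alert(baseline_calls, shifted_calls)
```

The Bonferroni split is conservative and requires the number of checks to be fixed in advance. A sequential procedure of the kind the abstract describes avoids that restriction, at the cost of a more involved analysis.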